1. INTRODUCTION

Several authors have examined the estimation of the proportions $p_1, p_2, \ldots, p_m$ in the mixture density

$$f(x) = p_1 f_1(x) + p_2 f_2(x) + \cdots + p_m f_m(x) \qquad (1.1)$$

where the component densities are specified as belonging to some parametric family, usually the normal. Hasselblad (1966), Day (1969), Hosmer (1973), Fowlkes (1979), and Redner and Walker (1984) have examined the use of maximum likelihood (ML) estimation of the parameters in (1.1) under the assumption that the component distributions are normal. Woodward et al. (1984) investigated, as an alternative to maximum likelihood, the use of minimum distance estimation based on a mixture-of-normals projection family and the Cramer-von Mises distance. We denote estimates obtained in this manner as MCVMD estimates and the corresponding estimator as the MCVMDE. Woodward et al. were able to show that the MCVMDE is more robust than the MLE to symmetric departures from component normality such as the double exponential, $t(4)$, and $t(2)$ distributions. Not surprisingly, however, the MLE was shown to be superior to the MCVMDE when the components were normal.

Intuitively, robust procedures are those which are insensitive to small deviations from the assumptions. Donoho and Liu (1988) have shown that the class of minimum distance estimators has “automatic” robustness properties over neighborhoods of the true model based on the distance functional defining the estimator. However, robust procedures such as minimum distance estimators typically obtain this robustness at the expense of not being optimal at the true model. In fact, Bickel (1978) describes robustness as “paying a price in terms of efficiency at the (true) model in terms of reasonably good maximum MSE over the neighborhood.” The behavior of the MCVMDE described above is a good example of this trade-off. However, Beran (1977) has suggested the use of the minimum Hellinger distance (MHD) estimator, or MHDE, which has certain robustness properties and is asymptotically efficient at the true model. Although Beran suggested a computational procedure for evaluating the MHDE, he provided very limited empirical evidence concerning its performance as an estimator. Eslinger and Woodward (1990) investigated the use of the MHDE for estimation of the parameters of the normal distribution with unknown location and scale. They demonstrated the practical feasibility of employing the MHDE in the normal setting, showed empirical robustness far outside Hellinger neighborhoods of the true model, and confirmed empirically the true-model efficiency properties shown theoretically by Beran. Tamura and Boos (1986) have investigated the performance of the MHDE in the estimation of location and covariance in multivariate data. The empirical findings of Eslinger and Woodward and of Tamura and Boos indicate that the MHDE is an attractive estimator.

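In this paper we examine the use of MHD estimation in the mixture of two normals whose density is given by

$$f_\theta(x) = p\,\phi(x, \mu_1, \sigma_1^2) + (1 - p)\,\phi(x, \mu_2, \sigma_2^2) \qquad (1.2)$$

where $\theta = (\mu_1, \sigma_1, \mu_2, \sigma_2, p)'$ and $\phi(x, \mu, \sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$. We will let $\hat{p}(H)$ and $\hat{p}(L)$ denote the MHD and ML estimates of the parameter $p$. In Section 2 we provide background material on the MHDE. In Section 3 we discuss its application to (1.2) where $p$ is unknown and the remaining parameters are known, while in Section 4 we investigate the case in which all five parameters are unknown.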

2.  THE  MINIMUM  HELLINGER  DISTANCE  ESTIMATOR 

Let $X_1, X_2, \ldots, X_n$ denote a random sample from some unknown population with distribution function $G$. Further, let $\mathcal{F} = \{F_\theta;\ \theta \in \Theta\}$ be a family of distributions, called the projection family or projection model, depending on the (possibly vector valued) parameter $\theta$. We will assume here that the distributions in $\mathcal{F}$ are mixtures of normals with densities of the form (1.2). A minimum distance estimator of $\theta$ is a value $\hat\theta$ which minimizes the distance between the data distribution and the projection model, usually by minimizing the “distance” between $F_\theta$ and $G_n$, where $G_n$ is the empirical distribution function

$$G_n(t) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le t), \qquad (2.1)$$

where $I$ denotes the indicator function. For example, the MCVMDE is obtained by using the Cramer-von Mises distance, $W^2$, which for distribution functions $Q_1$ and $Q_2$ is given by

$$W^2(Q_1, Q_2) = \int_{-\infty}^{\infty} [Q_1(x) - Q_2(x)]^2 \, dQ_2(x), \qquad (2.2)$$

to compute the distance between $F_\theta$ and $G_n$.

The Hellinger distance between two absolutely continuous distributions with distribution functions $Q_1$ and $Q_2$ is defined to be $\| q_1^{1/2} - q_2^{1/2} \|$, where $q_1$ and $q_2$ are the corresponding densities and the notation $\| \cdot \|$ denotes the usual $L_2$ norm, i.e.

$$\| h \| = \left[ \int h^2(x)\,dx \right]^{1/2}, \qquad (2.3)$$

where the integration is with respect to Lebesgue measure on the real line. The MHD estimator of $\theta$ is defined as a value $\hat\theta_n$ which minimizes $\| f_\theta^{1/2} - g_n^{1/2} \|$, where $g_n$ is a suitable nonparametric density estimator. We use the kernel density estimator

$$g_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} w\!\left( \frac{x - X_i}{h_n} \right) \qquad (2.4)$$

based on the Epanechnikov (1969) kernel $w(z) = .75(1 - z^2)$ for $|z| \le 1$.
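
As a minimal sketch of the estimator in (2.4), the following pure-Python code evaluates an Epanechnikov kernel density estimate and checks numerically that it integrates to one over its (finite) support. The function names, the seed, and the bandwidth value `h = 0.9` are illustrative assumptions and are not taken from the paper.

```python
import random

def epanechnikov(z):
    """Epanechnikov kernel w(z) = 0.75 (1 - z^2) on |z| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0

def kernel_density(x, data, h):
    """Kernel density estimate g_n(x) = (1 / (n h)) * sum_i w((x - X_i) / h)."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)

def simpson(values, step):
    """Composite Simpson's rule for an odd number of equally spaced ordinates."""
    if len(values) % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of points")
    total = values[0] + values[-1]
    total += 4.0 * sum(values[1:-1:2]) + 2.0 * sum(values[2:-1:2])
    return total * step / 3.0

rng = random.Random(0)                      # fixed seed, for reproducibility only
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
h = 0.9                                     # illustrative bandwidth (assumption)
lo, hi = min(sample) - h, max(sample) + h   # g_n vanishes outside [lo, hi]
m = 2001                                    # odd number of grid points
step = (hi - lo) / (m - 1)
grid = [lo + i * step for i in range(m)]
area = simpson([kernel_density(t, sample, h) for t in grid], step)
```

Because the kernel has compact support, `g_n` is exactly zero outside `[X_(1) - h, X_(n) + h]`, so integrals involving `g_n` only need to be evaluated on a finite interval.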
Parzen (1962) found the $h_n$ which minimizes the integrated mean square error between a kernel density estimator and the true density $g$. The optimal $h_n$ in this sense is $h_n = \alpha(w)\,\beta(g)\,n^{-1/5}$, where

$$\alpha(w) = \left[ \frac{\int w^2(z)\,dz}{\left( \int z^2 w(z)\,dz \right)^2} \right]^{1/5} \qquad (2.5)$$

and

$$\beta(g) = \left[ \int \left( g''(x) \right)^2 dx \right]^{-1/5}. \qquad (2.6)$$

For the Epanechnikov kernel $\alpha(w) = 1.71877$, and when $g(x)$ is a $N(\mu, \sigma^2)$ density, i.e. with mean $\mu$ and variance $\sigma^2$, then $\beta(g) = 1.364\sigma$. A natural implementation of the Epanechnikov kernel density estimate is to use $h_n = (1.71877)(1.364 s_n) n^{-1/5}$, where $s_n$ is an estimate of scale. In the case in which $g(x)$ is a mixture of normals as in (1.2), $\beta(g)$ is given by

$$[\beta(g)]^{-5} = \int \left\{ \frac{p^2}{\sigma_1^4}\,\phi^2(x, \mu_1, \sigma_1^2)\,(z_1^2 - 1)^2 + \frac{(1-p)^2}{\sigma_2^4}\,\phi^2(x, \mu_2, \sigma_2^2)\,(z_2^2 - 1)^2 + \frac{2p(1-p)}{\sigma_1^2 \sigma_2^2}\,\phi(x, \mu_1, \sigma_1^2)\,\phi(x, \mu_2, \sigma_2^2)\,(z_1^2 - 1)(z_2^2 - 1) \right\} dx \qquad (2.7)$$

where $z_i = (x - \mu_i)/\sigma_i$, $i = 1, 2$, and $\phi(x, \mu, \sigma^2)$ denotes the normal density function with mean $\mu$ and variance $\sigma^2$. In our implementation we used $h_n = 1.71877\,\beta(g)\,n^{-1/5}$, where $\beta(g)$ was obtained using numerical integration to approximate the integral in (2.7). From (2.7) it is seen that in this setting, $\beta(g)$ depends on all five of the mixture model parameters rather than simply being a function of scale as in the univariate normal setting.
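
The constants quoted above can be checked numerically. The sketch below, in pure Python, computes $\alpha(w)$ from the closed-form kernel integrals ($\int w^2 = 0.6$ and $\int z^2 w = 0.2$ for the Epanechnikov kernel) and approximates $\beta(g)$ by integrating the squared second derivative of a two-normal mixture, using $\phi'' = \phi\,(z^2 - 1)/\sigma^2$. The function names, the integration range of ten standard deviations, and the mesh size are illustrative assumptions, not the authors' implementation.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_second_derivative(x, p, mu1, s1, mu2, s2):
    """g''(x) for g = p*phi_1 + (1-p)*phi_2, using phi'' = phi * (z^2 - 1) / sigma^2."""
    z1 = (x - mu1) / s1
    z2 = (x - mu2) / s2
    return (p * normal_pdf(x, mu1, s1) * (z1 * z1 - 1.0) / s1 ** 2
            + (1.0 - p) * normal_pdf(x, mu2, s2) * (z2 * z2 - 1.0) / s2 ** 2)

def simpson(f, lo, hi, m=4001):
    """Composite Simpson's rule on m (odd) equally spaced points."""
    step = (hi - lo) / (m - 1)
    ys = [f(lo + i * step) for i in range(m)]
    return (ys[0] + ys[-1] + 4.0 * sum(ys[1:-1:2]) + 2.0 * sum(ys[2:-1:2])) * step / 3.0

def beta(p, mu1, s1, mu2, s2):
    """beta(g) = [ integral of (g'')^2 ]^(-1/5), integral approximated numerically."""
    lo = min(mu1 - 10.0 * s1, mu2 - 10.0 * s2)   # tails beyond this are negligible
    hi = max(mu1 + 10.0 * s1, mu2 + 10.0 * s2)
    integral = simpson(
        lambda x: mixture_second_derivative(x, p, mu1, s1, mu2, s2) ** 2, lo, hi)
    return integral ** (-0.2)

# alpha(w) = [ int w^2 / (int z^2 w)^2 ]^(1/5) with int w^2 = 0.6, int z^2 w = 0.2
ALPHA = (0.6 / 0.2 ** 2) ** 0.2

def bandwidth(n, p, mu1, s1, mu2, s2):
    """h_n = alpha(w) * beta(g) * n^(-1/5) for a two-normal mixture g."""
    return ALPHA * beta(p, mu1, s1, mu2, s2) * n ** (-0.2)

# With p = 1 the mixture collapses to a single N(0, 2^2) density.
beta_single_normal = beta(1.0, 0.0, 2.0, 0.0, 2.0)
h_mixture = bandwidth(100, 0.25, 0.0, 1.0, 3.0, 1.0)
```

In the single-normal case the ratio `beta_single_normal / 2.0` should reproduce the factor 1.364 quoted in the text.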

3.  MHD  ESTIMATION  WHEN  ONLY  p  IS  UNKNOWN 

(a)  Theoretical  Results 

As a first step in examining the use of the MHDE in the mixture-of-normals setting, we consider the case in which $f_\theta(x)$ is given by (1.2) and only $p$ is unknown. In Theorems 3.1 and 3.2 we provide conditions under which the MHD estimator in this setting is consistent and asymptotically normal. The consistency of the MHDE follows from the Hellinger consistency of the kernel density estimator together with the equivalence of the Hellinger metric on the probability distributions and the Euclidean metric on the parameter space; see Theorem 3 in Beran (1977) or Theorem 3.1 in Tamura and Boos (1986). In this section the Tamura and Boos paper will be referred to as TB. Either of these theorems implies the following:

Theorem 3.1. Let $f_\theta(x) = \theta f_1(x) + (1 - \theta) f_2(x)$, where $f_1$ and $f_2$ are distinct, continuous densities on $R$, and let $\theta \in [0,1] = \Theta$. If $g_n$ is Hellinger consistent, then the MHDE is consistent.


The asymptotic distribution of $\hat\theta_n$ is described in the next theorem, which is a consequence of TB’s Theorem 4.1.

Theorem 3.2. Let $f_\theta(x)$ be as in Theorem 3.1, and let $\theta \in (0,1) \subset [0,1] = \Theta$. Denote by $\hat\theta_n$ the MHDE of $\theta$ based on a random sample of size $n$ from a population with density $f_\theta$. Also suppose:

1. $\int |x|^k f_1(x)\,dx < \infty$ and $\int |x|^k f_2(x)\,dx < \infty$ for every $k > 0$.

2. $\lim_{|x| \to \infty} f_i(x) = 0$, $i = 1, 2$.

3. $f_1$ and $f_2$ satisfy Condition 5 from TB’s Theorem 4.1.

4. The bandwidth for the kernel, $h_n$, satisfies $h_n = a n^{-c}$ for some $c \in (0, 1/4)$ and $a > 0$.

Then $\sqrt{n}\,(\hat\theta_n - \theta - B_n) \to N(0, I(\theta)^{-1})$ in distribution, where $I(\theta)$ is the Fisher information and $B_n$ is given by


where $E[g_n] = \tilde{g}_n$.

As a result of Theorem 3.2 we see that $\hat\theta_n$ is asymptotically fully efficient. We will apply these results to the case in which $f_\theta$ is the mixture of normals in (1.2) with $\mu_1$, $\sigma_1$, $\mu_2$, and $\sigma_2$ known, and as mentioned earlier, we will use the notation $\hat{p}(H)$ for $\hat\theta_n$.

(b)  Implementation  Details 

The estimates may be obtained by minimizing $\int ( f_\theta^{1/2}(x) - g_n^{1/2}(x) )^2\,dx$ over $\theta \in [0,1]$. This minimization was performed using a golden section search as described in Press et al. (1986). The starting values for this optimization were obtained by examining the values of the integral over a grid of $\theta$'s on [0,1]; the optimization routine was always started in an interval which contained the global minimum of the quantity over the grid values. The integral was estimated using Simpson’s rule with a mesh of 201 points over the support of $g_n$. The bandwidth of the estimate $g_n$ was obtained by plugging into (2.7) the known $\mu_1$, $\sigma_1$, $\mu_2$, and $\sigma_2$ along with the mixing proportion estimated by the quasi-clustering technique in Woodward et al. (1984).
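
The procedure just described (grid scan, golden section search, Simpson's rule on 201 points over the support of $g_n$) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the 21-point grid on [0,1], the tolerance, the fixed bandwidth `h = 0.6`, the simulated data, and the seed are all hypothetical choices, and the bandwidth is supplied directly rather than computed from (2.7).

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def kernel_density(x, data, h):
    """Epanechnikov kernel density estimate g_n(x)."""
    total = 0.0
    for xi in data:
        z = (x - xi) / h
        if abs(z) <= 1.0:
            total += 0.75 * (1.0 - z * z)
    return total / (len(data) * h)

def simpson(values, step):
    """Composite Simpson's rule for an odd number of equally spaced ordinates."""
    return (values[0] + values[-1] + 4.0 * sum(values[1:-1:2])
            + 2.0 * sum(values[2:-1:2])) * step / 3.0

def golden_section(func, lo, hi, tol=1e-6):
    """Minimize a unimodal function on [lo, hi] by golden section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc < fd:                      # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = func(c)
        else:                            # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)

def mhde_proportion(data, mu1, s1, mu2, s2, h, n_grid=21, m=201):
    """MHD estimate of p with the component parameters treated as known."""
    lo, hi = min(data) - h, max(data) + h          # support of g_n
    step = (hi - lo) / (m - 1)
    xs = [lo + i * step for i in range(m)]
    f1 = [normal_pdf(x, mu1, s1) for x in xs]
    f2 = [normal_pdf(x, mu2, s2) for x in xs]
    sqrt_g = [math.sqrt(kernel_density(x, data, h)) for x in xs]

    def objective(p):
        # Simpson approximation of the squared Hellinger distance integrand
        vals = [(math.sqrt(p * a + (1.0 - p) * b) - s) ** 2
                for a, b, s in zip(f1, f2, sqrt_g)]
        return simpson(vals, step)

    ps = [i / (n_grid - 1) for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: objective(ps[i]))
    # start the search in the interval around the best grid value
    return golden_section(objective, ps[max(best - 1, 0)], ps[min(best + 1, n_grid - 1)])

rng = random.Random(1)                   # fixed seed, for reproducibility only
data = [rng.gauss(0.0, 1.0) if rng.random() < 0.25 else rng.gauss(4.0, 1.0)
        for _ in range(400)]
p_hat = mhde_proportion(data, 0.0, 1.0, 4.0, 1.0, h=0.6)
```

Precomputing the component densities and $g_n^{1/2}$ on the mesh means that each objective evaluation costs only 201 square roots, which keeps the one-dimensional search cheap.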

(c)  Simulation  Results 

Simulations were run in order to examine empirically the theoretical results of this section using the parameter configurations employed by Woodward et al. (1984). Simulations reported in this section and the next are based on mixing proportions .25, .5 and .75. For each of these mixing proportions, we considered mixtures of the densities $f_1(x)$ and $f_2(x)$, where $f_1(x)$ is the density for the random variable $X = aY$ and $f_2(x)$ is the density associated with $X = Y + b$, where $a > 0$ and $b > 0$. Thus, $a$ is the ratio of scale parameters, which we take to be 1 and $\sqrt{2}$, while $b$ was selected to provide the desired overlap between the two distributions. We considered “overlaps”, as defined by Woodward et al. (1984), of .03 and .1. In this section we consider the case in which $Y$ is normally distributed. For each set of configurations considered, 500 samples of size $n = 100$ were generated from the corresponding mixture distribution, and for each sample considered, the ML and MHD estimates were obtained. In Table 3.1 we present the results of the simulations, showing simulation-based estimates of the bias and MSE given by

$$\widehat{\text{Bias}} = \frac{1}{n_s} \sum_{i=1}^{n_s} (\hat{p}_i - p)$$

$$\widehat{\text{MSE}} = \frac{1}{n_s} \sum_{i=1}^{n_s} (\hat{p}_i - p)^2$$

where $n_s$ denotes the number of samples (500 in our case) and $\hat{p}_i$ denotes an estimate of $p$ for the $i$th sample. In the tables we report nMSE where $n$ is the size of each sample ($n = 100$ in our case), and in all cases, an approximate standard error of a tabled nMSE is (.0632)(nMSE). We also table empirical measures of the relative efficiencies of the MHDE with the MLE, i.e.

$$E = \frac{\widehat{\text{MSE}}(\text{MLE})}{\widehat{\text{MSE}}(\text{MHDE})}.$$

Examination of the table shows that the asymptotic full efficiency with respect to the MLE guaranteed by Theorem 3.2 holds approximately in the current setting with $n = 100$, as evidenced by the fact that all $E$ values are near 1. In Figure 1 we show a normal probability plot of $\hat{p}_i(H)$ and $\hat{p}_i(L)$, $i = 1, \ldots, 500$, obtained in the simulation for the case $p = .25$, ratio of scale parameters = 1 and overlap = .1. There it can be seen that the sampling distribution for each estimator closely approximates a normal curve.
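
The summary statistics reported in the tables are simple to compute. The sketch below uses hypothetical function names; it also shows one way the factor .0632 can arise, namely as $\sqrt{2/500}$, which is the relative standard error of an average of 500 squared errors when the estimator is approximately normal with negligible bias. That interpretation is an assumption on our part and is not stated in the text.

```python
import math

def bias(estimates, p):
    """Simulation estimate of bias: average of (p_hat_i - p)."""
    return sum(e - p for e in estimates) / len(estimates)

def mse(estimates, p):
    """Simulation estimate of MSE: average of (p_hat_i - p)^2."""
    return sum((e - p) ** 2 for e in estimates) / len(estimates)

def relative_efficiency(mle_estimates, mhde_estimates, p):
    """E = MSE(MLE) / MSE(MHDE); values above 1 favor the MHDE."""
    return mse(mle_estimates, p) / mse(mhde_estimates, p)

n_samples = 500
# Relative standard error of an average of n_samples squared normal errors
se_factor = math.sqrt(2.0 / n_samples)

# Consistency check against the first row of Table 3.1 (.10 overlap):
# nMSE(MLE) = .310 and nMSE(MHDE) = .297
e_first_row = round(0.310 / 0.297, 2)
```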

It should be noted that the asymptotic result in Theorem 3.2 is for the case in which the bandwidth $h_n$ is nonstochastic. In our implementation this bandwidth is random since it depends on the starting value estimate of the parameter $p$. The simulations indicate that the results continue to hold in this case.


4. MHD ESTIMATION WHEN $p$, $\mu_1$, $\sigma_1$, $\mu_2$ AND $\sigma_2$ ARE UNKNOWN

We consider in this section the case in which the five parameters $p$, $\mu_1$, $\sigma_1$, $\mu_2$ and $\sigma_2$ in (1.2) are all unknown, and we will again compare the MHD estimators with maximum likelihood. It is well known that the likelihood function is not bounded in this case (see Day 1969), and thus “ML” estimators in this setting are obtained by finding an appropriate local maximum. We will empirically compare the MHD and ML estimators in this setting using a large-scale simulation analysis in which we examine the efficiency and robustness of the estimators.

(a)  Implementation  Details 

Since minimizing $\| f_\theta^{1/2} - g_n^{1/2} \|$ is equivalent to maximizing

$$\int f_\theta^{1/2}(x)\, g_n^{1/2}(x)\,dx, \qquad (4.1)$$

Beran (1977) and Eslinger and Woodward (1990) obtained MHD estimates by using Newton’s method to maximize (4.1). One advantage of this approach is the fact that $g_n$ is zero outside a finite interval, simplifying the integration in (4.1). Woodward and Eslinger (1983) investigated the corresponding use of Newton’s method in the mixture-of-normals case, with starting values for the iteration being obtained using the quasi-clustering technique discussed by Woodward et al. (1984) for obtaining starting values of the mixture model parameters. However, they found that Newton’s method in this setting often failed to converge to reasonable estimates, with convergence occurring in less than 80% of the simulated samples from some configurations. Since the MHDE is defined to be a value which minimizes the integral

$$\int \left( f_\theta^{1/2}(x) - g_n^{1/2}(x) \right)^2 dx, \qquad (4.2)$$

we approximated this integral using the trapezoidal rule to obtain, up to the constant mesh width,

$$\sum_{i=1}^{k} a_i \left( f_\theta^{1/2}(t_i) - g_n^{1/2}(t_i) \right)^2, \qquad (4.3)$$

where $a_1 = a_k = 1/2$ and $a_i = 1$ for $i = 2, 3, \ldots, k-1$ for a partition $t_1, t_2, \ldots, t_k$ of $[a, b]$, a finite interval. In our case we took $k = 200$ and $[a, b]$ to be the interval $[X_{(1)} - 3, X_{(n)} + 3]$, where $X_{(j)}$ denotes the $j$th order statistic. The procedure employed was to minimize the sum-of-squares in (4.3) using IMSL routine ZXSSQ, which utilizes the Marquardt-Levenberg algorithm (1963). Using this procedure, the MHD estimates converged in at least 97.8% of the samples for each configuration considered. In the simulations, if convergence to “reasonable” values was not obtained, the starting values were used as the corresponding estimates. Specifically, if any of the conditions $\hat\sigma_1 > X_{(n)} - X_{(1)}$, $\hat\sigma_2 > X_{(n)} - X_{(1)}$, $\hat\mu_1 < X_{(1)} - (X_{(n)} - X_{(1)})/10$ or $\hat\mu_2 > X_{(n)} + (X_{(n)} - X_{(1)})/10$ held for any estimate, the corresponding estimate was taken to be the starting value. The kernel density estimate $g_n$ was obtained using the Epanechnikov kernel. In this case $\beta(g)$ was obtained by substituting the starting values for $\mu_1$, $\sigma_1$, $\mu_2$, $\sigma_2$, and $p$ into (2.7) and then performing the required integration numerically.
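
To make the sum-of-squares formulation concrete, the sketch below builds the residual vector whose squared length is (4.3), together with the “reasonable values” check described above. It is pure Python and does not reproduce IMSL's ZXSSQ; the resulting `residuals` function is of the form accepted by general nonlinear least-squares solvers (for example, SciPy's `scipy.optimize.least_squares` with `method="lm"` would be one possible substitute). The function names, the example parameter values, and the use of the true mixture density as a stand-in for $g_n$ are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, theta):
    """Two-normal mixture density with theta = (mu1, sigma1, mu2, sigma2, p)."""
    mu1, s1, mu2, s2, p = theta
    return p * normal_pdf(x, mu1, s1) + (1.0 - p) * normal_pdf(x, mu2, s2)

def trapezoid_weights(k):
    """Weights a_1 = a_k = 1/2 and a_i = 1 otherwise."""
    return [0.5] + [1.0] * (k - 2) + [0.5]

def residuals(theta, grid, sqrt_g):
    """r_i = sqrt(a_i) * (f_theta(t_i)^(1/2) - g_n(t_i)^(1/2)); sum of r_i^2 is (4.3)."""
    weights = trapezoid_weights(len(grid))
    return [math.sqrt(a) * (math.sqrt(mixture_pdf(t, theta)) - s)
            for a, t, s in zip(weights, grid, sqrt_g)]

def objective(theta, grid, sqrt_g):
    """Sum of squares in (4.3), omitting the constant mesh width."""
    return sum(r * r for r in residuals(theta, grid, sqrt_g))

def is_reasonable(theta, data):
    """False if any of the four conditions for rejecting an estimate holds."""
    mu1, s1, mu2, s2, _ = theta
    x_min, x_max = min(data), max(data)
    spread = x_max - x_min
    return not (s1 > spread or s2 > spread
                or mu1 < x_min - spread / 10.0
                or mu2 > x_max + spread / 10.0)

# Illustration: take the true density as a stand-in for g_n (assumption).
theta_true = (0.0, 1.0, 3.0, 1.0, 0.25)
k = 200
a, b = -6.0, 9.0
grid = [a + i * (b - a) / (k - 1) for i in range(k)]
sqrt_g = [math.sqrt(mixture_pdf(t, theta_true)) for t in grid]
```

Writing the criterion as a vector of residuals, rather than a single scalar, is what allows a Levenberg-Marquardt type routine to exploit the least-squares structure.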

(b) Simulation Results

The ML and MHD estimates were examined using simulations based on the basic framework used in Section 3, i.e. we considered the same mixing proportions, ratios of scale parameters and overlaps as considered there. As before, 500 samples of size $n = 100$ were generated from the corresponding mixture distributions, and we considered the cases in which the simulated component densities were normal, $t(4)$ and $t(2)$. For each sample considered, we computed the ML, MHD and MCVMD estimates, initialized employing the quasi-clustering technique used by Woodward et al. (1984). In Table 4.1 the simulation results for simulated mixtures of normal distributions indicate that again, as in the results of Section 3, the MHDE appears to obtain full efficiency at the true model, as evidenced by $E$ near one in all cases. However, the MCVMD estimators had larger MSE’s than did the MLE in 9 of the 10 cases, with some of the efficiencies near .5. In Table 4.2 we show similar results for samples which were simulated as mixtures of $t(4)$ components. All of the $E$’s in this table are greater than one, providing evidence that the MCVMDE and MHDE are more robust to departures from the assumption of normal components than is the MLE. Also, comparison of the MSE’s for the MHD and MCVMD estimators indicates that the robustness of the MHDE is comparable to that of the MCVMDE in this setting. In Table 4.3 we briefly consider the case in which the component distributions are $t(2)$, i.e. the departure from normality is more extreme. In this setting the performance of the MLE further deteriorates with respect to that of the two minimum distance estimators.

Although theoretical results similar to Theorems 3.1 and 3.2 have not been shown in this case, the simulation results suggest that such results hold. Although our emphasis here has been on the estimation of the mixing proportion, $p$, the ML and MHD routines used here obtain estimates for all five of the parameters in (1.2). The results for location and scale parameters are similar to those shown here for the mixing proportion when sampling from normal mixtures. In the case of simulations from the non-normal components considered here, the results for the location parameters also exhibited patterns similar to those shown in Tables 4.2-4.3. However, the scale estimates obtained by all three estimation methods often exhibited substantial bias in these non-normal cases.

5.  CONCLUDING  REMARKS 

Our  results  indicate  that  the  MHDE  obtains  full  efficiency  at  the  true  model  while  performing 
comparably  with  the  MCVMDE  under  the  symmetric  departures  from  component  normality 
considered.  Thus,  the  MHDE  is  a  very  attractive  alternative  to  both  the  MLE  and  the  MCVMDE  in 
these  settings. 

The  computation  of  the  MHDE  in  this  setting  is  quite  straightforward,  yet  in  the  cases 
considered  here,  it  took  from  1.5  to  5  times  longer  to  calculate  than  the  MLE  and  about  2.5  times 
longer  than  the  MCVMDE.  However,  Eslinger  and  Woodward  (1990)  have  shown  that  for  very  large 
sample  sizes,  the  MHDE  can  be  faster  to  compute  than  competing  estimators  because  of  the  fact  that  it 
requires  only  one  pass  through  the  data  to  evaluate  the  kernel  density  estimator  at  the  appropriate  grid 
points  for  numerical  integration. 

As  would  be  expected,  the  performance  of  the  estimators  declines  as  the  overlap  between  the 
two  components  increases.  The  sensitivity  to  overlap  was  more  extreme  in  the  case  in  which  all  five 
parameters  are  unknown  since  the  location  and  scale  of  each  component  must  then  be  estimated  from 
the  data.  Estimation  in  the  case  in  which  all  five  parameters  are  unknown  can  be  a  difficult  problem 
when  there  is  not  substantial  separation  between  the  components.  In  our  simulations,  the  estimators 
were  quite  poor  at  .1  overlap  when  all  five  parameters  were  estimated.  In  fact,  in  these  cases  the 
starting  values  often  outperformed  the  maximum  likelihood  and  minimum  distance  estimators.  This 
behavior has been previously observed by Woodward et al. (1984) and Woodward and Gunst (1987).

APPENDIX


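Proof of Theorem 3.2: The proof proceeds by verifying conditions 1-7 of TB’s Theorem 4.1. We begin by setting up the notation.

Let $f_1$, $f_2$ be continuous densities on $R$, and for $\theta \in [0, 1] = \Theta$ let $f_\theta = \theta f_1 + (1 - \theta) f_2$, so that $f_\theta$ is a simple mixture of $f_1$ and $f_2$. As in TB, we let

$$s_\theta = f_\theta^{1/2},$$

$$\dot{s}_\theta = \frac{\partial}{\partial\theta} s_\theta = \frac{1}{2} f_\theta^{-1/2} (f_1 - f_2)\, I_{\mathrm{supp}\, f_\theta},$$

$$\ddot{s}_\theta = -\frac{1}{4} f_\theta^{-3/2} (f_1 - f_2)^2\, I_{\mathrm{supp}\, f_\theta},$$

and

$$\psi_\theta(x) = -\left[ \int \ddot{s}_\theta(y)\, f_\theta^{1/2}(y)\,dy \right]^{-1} \frac{\dot{s}_\theta(x)}{2 f_\theta^{1/2}(x)} = \frac{1}{I(\theta)}\, \frac{f_1(x) - f_2(x)}{f_\theta(x)}\, I_{\mathrm{supp}\, f_\theta}(x),$$

where $I_{\mathrm{supp}\, f}$ denotes the indicator function of the support of $f(x)$ and where $I(\theta)$ is the Fisher Information, which is in this case equal to

$$I(\theta) = \int \frac{(f_1 - f_2)^2}{f_\theta}.$$

Note that $I(\theta) > 0$ if $f_1$ and $f_2$ are not equal. Also, if $\theta \in (0,1)$, $I(\theta) < \infty$, since

$$\frac{|f_1 - f_2|}{f_\theta} \le \frac{1}{\theta} + \frac{1}{1 - \theta}.$$

Finally, note that we often drop the constant $a$ in the bandwidth $h_n = a n^{-c}$; this does not affect the result.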

TB.1 The conditions on the kernel are satisfied by the Epanechnikov kernel (symmetric, compact support = $[-1, 1]$). Our condition on $h_n$ implies $n h_n \to \infty$ and $h_n \to 0$.

TB.2 Condition 2.b holds: $\Theta = [0, 1]$ is compact, $f_\theta(x)$ is continuous in $\theta$, and $\theta_1 \ne \theta_2 \Rightarrow f_{\theta_1} \ne f_{\theta_2}$ on a set of positive Lebesgue measure.

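TB.3 Let $\alpha_n = h_n^{-1}$ and let $X \sim f_\theta$, $X_{f_1} \sim f_1$ and $X_{f_2} \sim f_2$. Then for $t \in [-1, 1]$,

$$n\,\mathrm{Prob}_{f_\theta}\{ |X - h_n t| > \alpha_n \} = n\theta\,\mathrm{Prob}_{f_1}\{ |X_{f_1} - h_n t| > \alpha_n \} + n(1 - \theta)\,\mathrm{Prob}_{f_2}\{ |X_{f_2} - h_n t| > \alpha_n \}$$

$$\le n\theta\, E_{f_1} |X_{f_1} - h_n t|^k / \alpha_n^k + n(1 - \theta)\, E_{f_2} |X_{f_2} - h_n t|^k / \alpha_n^k.$$

Since $h_n \to 0$ as $n \to \infty$ and $E|X - h_n t|^k$ is a continuous function of $h_n t$ (this is particularly easy to see for $k$ an even integer), $E_{f_1}|X_{f_1} - h_n t|^k$ and $E_{f_2}|X_{f_2} - h_n t|^k$ converge to $E_{f_1}|X_{f_1}|^k$ and $E_{f_2}|X_{f_2}|^k$ respectively, uniformly over $t \in [-1, 1]$. Thus,

$$n \sup_{t \in [-1,1]} \mathrm{Prob}_{f_\theta}\{ |X - h_n t| > \alpha_n \} = O(n \alpha_n^{-k}).$$

A choice of $k$ can be made so that $n \alpha_n^{-k} \to 0$ as $n \to \infty$.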


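TB.4 We examine

[display illegible in the source; the surviving fragments show a chain of upper bounds involving the factors $n^{c - 1/2}$, $2n^{c}$ and $1/(1 - \theta)$]

This converges to zero since $0 < c < 1/4$.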


TB.5 We must show

$$\sup_{|x| \le n^c}\ \sup_{t \in [-1,1]} \frac{f_\theta(x + h_n t)}{f_\theta(x)} = O(1).$$

Note

$$\frac{f_\theta(x + h_n t)}{f_\theta(x)} = \theta\, \frac{f_1(x + h_n t)}{f_\theta(x)} + (1 - \theta)\, \frac{f_2(x + h_n t)}{f_\theta(x)} \le \frac{f_1(x + h_n t)}{f_1(x)}\, I_{\mathrm{supp}\, f_1}(x) + \frac{f_2(x + h_n t)}{f_2(x)}\, I_{\mathrm{supp}\, f_2}(x).$$

The result follows from our Condition 3.


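TB.6

1. $\int \psi_\theta^2(x)\, f_\theta(x)\,dx = I(\theta)^{-1} < \infty$, since $f_1 \ne f_2$.

2. $$\int \psi_\theta^2(x + a)\, f_\theta(x)\,dx = \frac{1}{I(\theta)^2} \int \left( \frac{f_1(x + a) - f_2(x + a)}{f_\theta(x + a)} \right)^2 f_\theta(x)\,dx \le I(\theta)^{-2} \int \left( \frac{1}{\theta} + \frac{1}{1 - \theta} \right)^2 f_\theta(x)\,dx < \infty,$$

independent of $a$.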


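3. $$\int \left( \psi_\theta(x + a) - \psi_\theta(x) \right)^2 f_\theta(x)\,dx = \frac{1}{I(\theta)^2} \int \left( \frac{f_1(x + a) - f_2(x + a)}{f_\theta(x + a)} - \frac{f_1(x) - f_2(x)}{f_\theta(x)} \right)^2 f_\theta(x)\,dx$$

$$\le \frac{1}{I(\theta)^2} \left\| f_\theta^{1/4}(x) \left( \frac{f_1(x + a) - f_2(x + a)}{f_\theta(x + a)} - \frac{f_1(x) - f_2(x)}{f_\theta(x)} \right) \right\|_\infty^2 \int f_\theta^{1/2}(x)\,dx.$$

The integral $\int f_\theta^{1/2} < \infty$ since the tails of $f_1$ and $f_2$ (and so $f_\theta$) decrease faster than $|x|^{-b}$ for any $b > 0$. Thus, we need to show the $L_\infty$ norm goes to 0 as $a \to 0$. To see this, note first that both

$$\left| \frac{f_1(x + a) - f_2(x + a)}{f_\theta(x + a)} \right| \le \frac{1}{\theta} + \frac{1}{1 - \theta} \quad \text{and} \quad \left| \frac{f_1(x) - f_2(x)}{f_\theta(x)} \right| \le \frac{1}{\theta} + \frac{1}{1 - \theta},$$

and that

$$\left( \frac{1}{\theta} + \frac{1}{1 - \theta} \right) f_\theta^{1/4}(x) \to 0$$

as $|x| \to \infty$ by Condition 2. Given $\epsilon > 0$ $\exists\, M > 0$ such that $\forall\, |x| > M$ and any $a \in R$,

$$\left| \frac{f_1(x + a) - f_2(x + a)}{f_\theta(x + a)} \right| f_\theta^{1/4}(x) < \epsilon/2 \quad \text{and} \quad \left| \frac{f_1(x) - f_2(x)}{f_\theta(x)} \right| f_\theta^{1/4}(x) < \epsilon/2.$$

For $|x| \le M$,

$$\left( \frac{f_1(x + a) - f_2(x + a)}{f_\theta(x + a)} - \frac{f_1(x) - f_2(x)}{f_\theta(x)} \right) f_\theta^{1/4}(x) \to 0$$

as $a \to 0$ uniformly over $|x| \le M$. So $\exists\, \delta > 0$ with $|a| < \delta$ implying

$$\left\| f_\theta^{1/4}(x) \left( \frac{f_1(x + a) - f_2(x + a)}{f_\theta(x + a)} - \frac{f_1(x) - f_2(x)}{f_\theta(x)} \right) \right\|_\infty < \epsilon.$$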

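TB.7

1. Lemma 1 of Beran

(i) $\dot{s}_\theta(x) = \frac{1}{2} f_\theta^{-1/2} (f_1 - f_2)$, which is continuous in $\theta$ $\forall\, x \in \mathrm{supp}\, f_\theta$.

(ii) We need to show $\| \dot{s}_\theta \|$ is continuous. We will show the stronger condition, that $\dot{s}_\theta$ is $L_2$ continuous.

First note

$$\int \dot{s}_\theta^2 = \frac{1}{4} I(\theta) < \infty, \text{ so that } \dot{s}_\theta \in L_2.$$

We now compare $\dot{s}_\theta$ and $\dot{s}_{\theta + \Delta\theta}$:

$$\int \left( \dot{s}_\theta - \dot{s}_{\theta + \Delta\theta} \right)^2 = \frac{(\Delta\theta)^2}{4} \int \frac{(f_1 - f_2)^4}{f_\theta\, f_{\theta + \Delta\theta} \left( f_\theta^{1/2} + f_{\theta + \Delta\theta}^{1/2} \right)^2}$$

$$\le (\Delta\theta)^2 \int \frac{(f_1 - f_2)^4}{\left( f_\theta\, f_{\theta + \Delta\theta} \right)^{3/2}}$$

$$= (\Delta\theta)^2 \int \frac{|f_1 - f_2|}{f_\theta} \cdot \frac{|f_1 - f_2|}{f_{\theta + \Delta\theta}} \cdot \frac{(f_1 - f_2)^2}{\left( f_\theta\, f_{\theta + \Delta\theta} \right)^{1/2}}$$

$$\le (\Delta\theta)^2 \left( \frac{1}{\theta} + \frac{1}{1 - \theta} \right) \left( \frac{1}{\theta + \Delta\theta} + \frac{1}{1 - (\theta + \Delta\theta)} \right) \int \frac{(f_1 - f_2)^2}{\left( f_\theta\, f_{\theta + \Delta\theta} \right)^{1/2}},$$

which converges to zero as $\Delta\theta \to 0$.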

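2. Lemma 2 of Beran:

(i) $\ddot{s}_\theta = -\frac{1}{4} f_\theta^{-3/2} (f_1 - f_2)^2$, which is continuous in $\theta$ $\forall\, x \in \mathrm{supp}\, f_\theta$.

(ii) To show $\ddot{s}_\theta \in L_2$ and $\| \ddot{s}_\theta \|$ is continuous, we will show that $\ddot{s}_\theta$ is in fact $L_2$ continuous.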

First  note 


Next we argue that $\ddot{s}_\theta$ is $L_2$ continuous:


9+A9 


From  here,  one  may  proceed  as  in  Lemma  1  part  (ii). 


3. $\theta = T(f_\theta) \in (0,1)$, since $\theta \in (0,1)$.

4. $\int \ddot{s}_\theta\, f_\theta^{1/2} = -\frac{1}{4} I(\theta)$.

Table 3.1  Simulation Results for Mixtures of Normal Components
With Only p Unknown

Sample Size = 100
Number of Samples = 500

       Ratio of                        .10 Overlap             .03 Overlap
 p     Scale Factors (a)  Estimator    Bias    nMSE    E       Bias    nMSE    E

.25    1                  MHDE         .011    .297   1.04    -.000    .212    .99
                          MLE         -.003    .310           -.002    .209

.50    1                  MHDE         .000    .309   1.11     .003    .281   1.00
                          MLE          .000    .343            .002    .280

.25    √2                 MHDE         .010    .311    .96     .000    .207    .98
                          MLE          .001    .299           -.001    .203

.50    √2                 MHDE         .002    .315   1.05     .003    .302    .99
                          MLE         -.002    .332            .001    .300

.75    √2                 MHDE        -.010    .297   1.07    -.000    .216   1.03
                          MLE         -.002    .319           -.001    .222

Table 4.1  Simulation Results for Mixtures of Normal Components
With All 5 Parameters Unknown

Sample Size = 100
Number of Samples = 500

       Ratio of                        .10 Overlap             .03 Overlap
 p     Scale Factors (a)  Estimator    Bias    nMSE    E       Bias    nMSE    E

.25    1                  MHDE         .064   4.723   1.06     .006    .435   1.03
                          MCVMDE       .142   8.944    .56     .028   1.029    .44
                          MLE          .063   5.003            .088    .449

.50    1                  MHDE         .009   2.733   1.16     .005    .403   1.02
                          MCVMDE      -.009   3.683    .86     .004    .440    .94
                          MLE          .007   3.158            .004    .412

.25    √2                 MHDE        -.006   2.005   1.06    -.003    .383   1.25
                          MCVMDE       .080   5.228    .40     .019    .831    .58
                          MLE         -.005   2.117            .005    .479

.50    √2                 MHDE        -.021   2.005   1.29    -.006    .376   1.07
                          MCVMDE       .005   2.951    .88    -.000    .393   1.02
                          MLE         -.014   2.584           -.002    .402

.75    √2                 MHDE        -.073   4.660   1.07    -.003    .396   1.29
                          MCVMDE      -.119   7.742    .64    -.022   1.020    .50
                          MLE         -.077   4.993           -.002    .512

Table 4.2  Simulation Results for Mixtures of t(4) Components
With All 5 Parameters Unknown

Sample Size = 100
Number of Samples = 500

       Ratio of                        .10 Overlap             .03 Overlap
 p     Scale Factors (a)  Estimator    Bias    nMSE    E       Bias          nMSE    E

.25    1                  MHDE         .056   4.862   1.18     [illegible]   .297   2.77
                          MCVMDE       .066   4.144   1.38     [illegible]   .428   1.92
                          MLE          .069   5.725            .035          .823

.50    1                  MHDE         .002   3.489   1.56     .000          .314   1.51
                          MCVMDE       .003   1.855   2.94     .001          .301   1.57
                          MLE          .024   5.457            .003          .473

.25    √2                 MHDE         .076   4.348   1.17     .014          .404   2.48
                          MCVMDE       .095   4.968   1.02     .031          .652   1.54
                          MLE          .090   5.080            .046         1.003

.50    √2                 MHDE         .039   3.300   1.52    -.003          .250   1.82
                          MCVMDE       .025   1.978   2.54    -.000          .254   1.80
                          MLE          .024   5.030            .009          .456

.75    √2                 MHDE        -.031   4.780   1.77    -.012          .273   1.90
                          MCVMDE      -.055   4.045   2.10    -.019          .396   1.31
                          MLE         -.078   8.483           -.014          .519

Table 4.3  Simulation Results for Mixtures of t(2) Components
With All 5 Parameters Unknown

Sample Size = 100
Number of Samples = 500

       Ratio of                        .10 Overlap             .03 Overlap
 p     Scale Factors (a)  Estimator    Bias    nMSE     E      Bias          nMSE    E

.25    1                  MHDE         .123   6.996   1.14     [illegible]   .257   6.05
                          MCVMDE       .079   3.745   2.13     [illegible]   .328   4.74
                          MLE          .097   7.962            .069         1.555

.50    1                  MHDE        -.007   4.547   2.20    -.002          .285   2.96
                          MCVMDE       .006   1.172   8.55    -.002          .282   2.99
                          MLE         -.003  10.016            .004          .843

Figure 1: Normal Probability Plots of MLE and MHDE Estimates